AITopics | optimistic mirror descent

Optimistic Mirror Descent Either Converges to Nash or to Strong Coarse Correlated Equilibria in Bimatrix Games

Neural Information Processing SystemsDec-24-2025, 09:32:56 GMT

We show that, for any sufficiently small fixed $\epsilon > 0$, when both players in a general-sum two-player (bimatrix) game employ optimistic mirror descent (OMD) with smooth regularization, learning rate $\eta = O(\epsilon^2)$ and $T = \Omega(poly(1/\epsilon))$ repetitions, either the dynamics reach an $\epsilon$-approximate Nash equilibrium (NE), or the average correlated distribution of play is an $\Omega(poly(\epsilon))$-strong coarse correlated equilibrium (CCE): any possible unilateral deviation does not only leave the player worse, but will decrease its utility by $\Omega(poly(\epsilon))$. As an immediate consequence, when the iterates of OMD are bounded away from being Nash equilibria in a bimatrix game, we guarantee convergence to an \emph{exact} CCE after only $O(1)$ iterations. Our results reveal that uncoupled no-regret learning algorithms can converge to CCE in general-sum games remarkably faster than to NE in, for example, zero-sum games. To establish this, we show that when OMD does not reach arbitrarily close to a NE, the (cumulative) regret of both players is not only negative, but decays linearly with time. Given that regret is the canonical measure of performance in online learning, our results suggest that cycling behavior of no-regret learning algorithms in games can be justified in terms of efficiency.

name change, optimistic mirror descent, strong coarse correlated equilibria, (9 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.99)

Add feedback

Optimization, Learning, and Games with Predictable Sequences

Neural Information Processing SystemsSep-30-2025, 13:07:55 GMT

We provide several applications of Optimistic Mirror Descent, an online learning algorithm based on the idea of predictable sequences. First, we recover the Mirror-Prox algorithm, prove an extension to Holder-smooth functions, and apply the results to saddle-point type problems. Second, we prove that a version of Optimistic Mirror Descent (which has a close relation to the Exponential Weights algorithm) can be used by two strongly-uncoupled players in a finite zero-sum matrix game to converge to the minimax equilibrium at the rate of O(log T / T). This addresses a question of Daskalakis et al, 2011. Further, we consider a partial information version of the problem. We then apply the results to approximate convex programming and show a simple algorithm for the approximate Max-Flow problem.

name change, optimization, predictable sequence, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.42)

Add feedback

Optimistic Mirror Descent Either Converges to Nash or to Strong Coarse Correlated Equilibria in Bimatrix Games

Neural Information Processing SystemsOct-11-2024, 10:56:55 GMT

We show that, for any sufficiently small fixed \epsilon 0, when both players in a general-sum two-player (bimatrix) game employ optimistic mirror descent (OMD) with smooth regularization, learning rate \eta O(\epsilon 2) and T \Omega(poly(1/\epsilon)) repetitions, either the dynamics reach an \epsilon -approximate Nash equilibrium (NE), or the average correlated distribution of play is an \Omega(poly(\epsilon)) -strong coarse correlated equilibrium (CCE): any possible unilateral deviation does not only leave the player worse, but will decrease its utility by \Omega(poly(\epsilon)) . As an immediate consequence, when the iterates of OMD are bounded away from being Nash equilibria in a bimatrix game, we guarantee convergence to an \emph{exact} CCE after only O(1) iterations. Our results reveal that uncoupled no-regret learning algorithms can converge to CCE in general-sum games remarkably faster than to NE in, for example, zero-sum games. To establish this, we show that when OMD does not reach arbitrarily close to a NE, the (cumulative) regret of both players is not only negative, but decays linearly with time. Given that regret is the canonical measure of performance in online learning, our results suggest that cycling behavior of no-regret learning algorithms in games can be justified in terms of efficiency.

bimatrix game, optimistic mirror descent, strong coarse correlated equilibria, (7 more...)

Neural Information Processing Systems

Genre: Play > Prospect (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Optimization, Learning, and Games with Predictable Sequences

Neural Information Processing SystemsMar-13-2024, 23:18:25 GMT

We provide several applications of Optimistic Mirror Descent, an online learning algorithm based on the idea of predictable sequences. First, we recover the Mirror Prox algorithm for offline optimization, prove an extension to Hölder-smooth functions, and apply the results to saddle-point type problems. Next, we prove that a version of Optimistic Mirror Descent (which has a close relation to the Exponential Weights algorithm) can be used by two strongly-uncoupled players in a finite zero-sum matrix game to converge to the minimax equilibrium at the rate of O((log T) T).

algorithm, optimization, sequence, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry: Education > Educational Setting > Online (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.34)

Add feedback

Optimization, Learning, and Games with Predictable Sequences

Rakhlin, Sasha, Sridharan, Karthik

Neural Information Processing SystemsFeb-14-2020, 19:26:17 GMT

We provide several applications of Optimistic Mirror Descent, an online learning algorithm based on the idea of predictable sequences. First, we recover the Mirror-Prox algorithm, prove an extension to Holder-smooth functions, and apply the results to saddle-point type problems. Second, we prove that a version of Optimistic Mirror Descent (which has a close relation to the Exponential Weights algorithm) can be used by two strongly-uncoupled players in a finite zero-sum matrix game to converge to the minimax equilibrium at the rate of O(log T / T). This addresses a question of Daskalakis et al, 2011. Further, we consider a partial information version of the problem.

algorithm, optimization, predictable sequence, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.53)

Add feedback

On the modes of convergence of Stochastic Optimistic Mirror Descent (OMD) for saddle point problems

Ma, Yanting, Aeron, Shuchin, Mansour, Hassan

arXiv.org Machine LearningAug-2-2019

In this article, we study the convergence of Mirror Descent (MD) and Optimistic Mirror Descent (OMD) for saddle point problems satisfying the notion of coherence as proposed in Mertikopoulos et al. We prove convergence of OMD with exact gradients for coherent saddle point problems, and show that monotone convergence only occurs after some sufficiently large number of iterations. This is in contrast to the claim in Mertikopoulos et al. of monotone convergence of OMD with exact gradients for coherent saddle point problems. Besides highlighting this important subtlety, we note that the almost sure convergence guarantees of MD and OMD with stochastic gradients for strictly coherent saddle point problems that are claimed in Mertikopoulos et al. are not fully justified by their proof. As such, we fill out the missing details in the proof and as a result have only been able to prove convergence with high probability. We would like to note that our analysis relies heavily on the core ideas and proof techniques introduced in Zhou et al. and Mertikopoulos et al., and we only aim to re-state and correct the results in light of what we were able to prove rigorously while filling in the much needed missing details in their proofs.

artificial intelligence, convergence, nullg, (15 more...)

arXiv.org Machine Learning

1908.01071

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.34)

Add feedback

Online Optimization : Competing with Dynamic Comparators

Jadbabaie, Ali, Rakhlin, Alexander, Shahrampour, Shahin, Sridharan, Karthik

arXiv.org Machine LearningJan-25-2015

Recent literature on online learning has focused on developing adaptive algorithms that take advantage of a regularity of the sequence of observations, yet retain worst-case performance guarantees. A complementary direction is to develop prediction methods that perform well against complex benchmarks. In this paper, we address these two directions together. We present a fully adaptive method that competes with dynamic benchmarks in which regret guarantee scales with regularity of the sequence of cost functions and comparators. Notably, the regret bound adapts to the smaller complexity measure in the problem environment. Finally, we apply our results to drifting zero-sum, two-player games where both players achieve no regret guarantees against best sequences of actions in hindsight.

artificial intelligence, machine learning, sequence, (17 more...)

arXiv.org Machine Learning

1501.06225

Country: North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.37)

Add feedback

Optimization, Learning, and Games with Predictable Sequences

Rakhlin, Sasha, Sridharan, Karthik

Neural Information Processing SystemsDec-31-2013

We provide several applications of Optimistic Mirror Descent, an online learning algorithm based on the idea of predictable sequences. First, we recover the Mirror-Prox algorithm, prove an extension to Holder-smooth functions, and apply the results to saddle-point type problems. Second, we prove that a version of Optimistic Mirror Descent (which has a close relation to the Exponential Weights algorithm) can be used by two strongly-uncoupled players in a finite zero-sum matrix game to converge to the minimax equilibrium at the rate of O(log T / T). This addresses a question of Daskalakis et al, 2011. Further, we consider a partial information version of the problem. We then apply the results to approximate convex programming and show a simple algorithm for the approximate Max-Flow problem.

algorithm, artificial intelligence, machine learning, (14 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)

Add feedback

Filters

Collaborating Authors

optimistic mirror descent

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Optimistic Mirror Descent Either Converges to Nash or to Strong Coarse Correlated Equilibria in Bimatrix Games

Optimization, Learning, and Games with Predictable Sequences

Optimistic Mirror Descent Either Converges to Nash or to Strong Coarse Correlated Equilibria in Bimatrix Games

Optimization, Learning, and Games with Predictable Sequences

Optimization, Learning, and Games with Predictable Sequences

On the modes of convergence of Stochastic Optimistic Mirror Descent (OMD) for saddle point problems

Online Optimization : Competing with Dynamic Comparators

Optimization, Learning, and Games with Predictable Sequences